Abstract The ordered weighted $\ell_1$ (OWL) norm is a newly developed generalization of the Octagonal
Shrinkage and Clustering Algorithm for Regression (OSCAR) norm. This norm has desirable statistical
properties and can be used to perform simultaneous clustering and regression. In this paper, we show how
to compute the projection of an $n$-dimensional vector onto the OWL norm ball in $O(n \log(n))$ operations.
In addition, we illustrate the performance of our algorithm on a synthetic regression test.


1 Introduction 


Sparsity is commonly used as a model selection tool in statistical and machine learning problems. For 
example, consider the following Ivanov regularized (or constrained) regression problem: 


$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \ \frac{1}{2}\|Ax - b\|^2 \quad \text{subject to: } \|x\|_0 \le \varepsilon, \tag{1.1}$$


where $m, n > 0$ are integers, $\varepsilon > 0$ is a real number, $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ are given, and $\|x\|_0$ is the number
of nonzero components of a vector $x \in \mathbb{R}^n$. Solving (1.1) yields the “best” predictor $x$ with at most $\varepsilon$
nonzero components. Unfortunately, (1.1) is nonconvex and NP-hard [12]. Thus, in practice the following
convex surrogate (LASSO) problem is solved instead (see e.g., [6]):


$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \ \frac{1}{2}\|Ax - b\|^2 \quad \text{subject to: } \|x\|_1 \le \varepsilon, \tag{1.2}$$

where $\|x\|_1 = \sum_{i=1}^n |x_i|$.

Recently, researchers have moved beyond the search for sparse predictors and have begun to analyze 
“group-structured” sparse predictors [1]. These models are motivated by a deficiency of (1.1) and (1.2): they
yield a predictor with a small number of nonzero components, but they fail to identify and take into account
similarities between features. In contrast, group-structured predictors simultaneously cluster and select
groups of features for prediction purposes. Mathematically, this behavior can be enforced by replacing the
$\ell_0$ and $\ell_1$ norms in (1.1) and (1.2) with new regularizers. Typical choices for group-structured regularizers
include the Elastic Net [19] (EN), Fused LASSO [15], Sparse Group LASSO [14], and Octagonal Shrinkage
and Clustering Algorithm for Regression [5] (OSCAR). The EN and OSCAR regularizers have the benefit of
being invariant under permutations of the components of the predictor and do not require prior specification
of the desired groups of features (when a clustering is not known a priori). However, OSCAR has been shown
to outperform EN regularization in feature grouping [5,18]. This has motivated the recent development of
the ordered weighted $\ell_1$ norm [4,16] (OWL) (see (2.1) below), which includes the OSCAR, $\ell_1$, and $\ell_\infty$ norms
as special cases.

Related work. Recently, the paper [17] investigated the properties of the OWL norm, discovered the
atomic norm characterization of the OWL norm, and developed an $O(n \log(n))$ algorithm for computing its
proximal operator (also see [4] for the computation of the proximal operator). Using the atomic characterization
of the OWL norm, the paper [17] showed how to apply the Frank-Wolfe conditional gradient algorithm
(CG) [8] to the Ivanov regularized OWL norm regression problem. However, when more complicated, and
perhaps nonsmooth, data fitting and regularization terms are included in the Ivanov regularization model,
the Frank-Wolfe algorithm can no longer be applied. If we knew how to quickly project onto the OWL norm
ball, we could apply modern proximal-splitting algorithms [7], which can perform better than CG for OWL
problems [17], to get a solution of modest accuracy quickly. Note that [17] proposes a root-finding scheme
for projecting onto the OWL norm ball, but it is not guaranteed to terminate at an exact solution in a finite
number of steps.

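Contributions. The paper introduces an $O(n \log(n))$ algorithm and MATLAB code for projecting onto
the OWL norm ball (Algorithm 1). Given a norm $f : \mathbb{R}^n \to \mathbb{R}_+$, computing the proximal map

$$\operatorname{prox}_f(z) := \underset{x \in \mathbb{R}^n}{\operatorname{argmin}} \ f(x) + \frac{1}{2}\|x - z\|^2$$

can be significantly easier than evaluating the projection map

$$P_{\{x \in \mathbb{R}^n \mid f(x) \le \varepsilon\}}(z) := \underset{f(x) \le \varepsilon}{\operatorname{argmin}} \ \frac{1}{2}\|x - z\|^2. \tag{1.3}$$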


In this paper, we devise an $O(n \log(n))$ algorithm to project onto the OWL norm ball that matches the
complexity (up to constants) of the currently best performing algorithm for computing the proximal operator
of the OWL norm. The algorithm we present is the first known method that computes the projection in
a finite number of steps, unlike the existing root-finding scheme [17], which only provides an approximate
solution in finite time. In addition, using duality (see (2.4)) we immediately get an $O(n \log(n))$ algorithm
for computing the proximity operator of the dual OWL norm (see (2.3)).

The main bottleneck in evaluating the proximity and projection operators of the OWL norm arises from
repeated partial sortings and averagings. Unfortunately, this seems unavoidable because even evaluating
the OWL norm requires sorting a (possibly) high-dimensional vector. This suggests that any OWL norm
projection algorithm requires $\Omega(n \log(n))$ operations in the worst case.

Organization. The OWL norm is introduced in Section 2. In Section 2.1, we reduce the OWL norm
projection to a simpler problem (Problem 2.1). In Section 2.2, we introduce crucial notation and properties
for working with partitions. In Section 3, we introduce the 6 alternatives (Proposition 3.1), which directly lead
to our main algorithm (Algorithm 1) and its complexity (Theorem 3.2). Finally, in Section 4, we illustrate
the performance of our algorithm on a synthetic regression test.


2 Basic Properties and Definitions 

We begin with the definition of the OWL norm. 

Definition 2.1 (The OWL Norm) Let $n \ge 1$ and let $w \in \mathbb{R}^n$ satisfy $w_1 \ge w_2 \ge \cdots \ge w_n \ge 0$ with $w_1 \ne 0$.
Then for all $x \in \mathbb{R}^n$, the OWL norm $\Omega_w : \mathbb{R}^n \to \mathbb{R}_+$ is given by

$$\Omega_w(x) := \sum_{i=1}^n w_i |x|_{[i]} \tag{2.1}$$

where for any $x \in \mathbb{R}^n$, the scalar $|x|_{[i]}$ is the $i$-th largest element of the magnitude vector $|x| := (|x_1|, \ldots, |x_n|)^T$.

For all $\varepsilon > 0$, let $B(w, \varepsilon) := \{x \in \mathbb{R}^n \mid \Omega_w(x) \le \varepsilon\}$ be the closed OWL norm ball of radius $\varepsilon$.

Notice that when $w$ is a constant vector, we have $\Omega_w = w_1 \|\cdot\|_1$. On the other hand, when $w_1 = 1$ and
$w_i = 0$ for $i = 2, \ldots, n$, we have $\Omega_w = w_1 \|\cdot\|_\infty$. Finally, given nonnegative real numbers $\mu_1$ and $\mu_2$, for all
$i \in \{1, \ldots, n\}$, define $w_i = \mu_1 + \mu_2 (n - i)$. Then the OSCAR norm [4] is precisely:

$$\Omega_w(x) = \mu_1 \|x\|_1 + \mu_2 \sum_{i < j} \max\{|x_i|, |x_j|\}. \tag{2.2}$$
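As a small illustration of (2.1) and (2.2), the following Python sketch evaluates the OWL norm by sorting the magnitudes, builds the OSCAR weights $w_i = \mu_1 + \mu_2(n - i)$, and compares the result with the pairwise-maximum form of the OSCAR norm. The function names `owl_norm`, `oscar_weights`, and `oscar_norm` are hypothetical and are not taken from the authors' code.

```python
def owl_norm(x, w):
    """Evaluate (2.1): pair the i-th largest magnitude of x with w_i."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * m for wi, m in zip(w, mags))


def oscar_weights(n, mu1, mu2):
    """OSCAR weights w_i = mu1 + mu2 * (n - i) for i = 1, ..., n."""
    return [mu1 + mu2 * (n - i) for i in range(1, n + 1)]


def oscar_norm(x, mu1, mu2):
    """Evaluate (2.2) directly: mu1 * ||x||_1 + mu2 * sum_{i<j} max(|x_i|, |x_j|)."""
    a = [abs(v) for v in x]
    pair_sum = sum(max(a[i], a[j])
                   for i in range(len(a)) for j in range(i + 1, len(a)))
    return mu1 * sum(a) + mu2 * pair_sum


x = [1, -4, 2]
w = oscar_weights(len(x), 1, 2)   # [5, 3, 1]
print(owl_norm(x, w), oscar_norm(x, 1, 2))
```

The direct evaluation of (2.2) costs $O(n^2)$ operations, whereas the sorted form costs $O(n \log(n))$, which is one reason the OWL representation is convenient.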

Note that $\Omega_w$ was originally shown to be a norm in [4, 16]. The paper [16] also showed that the dual norm
(in the sense of functional analysis) of $\Omega_w$ has the form

$$\Omega_w^*(x) = \max\{\tau_j \|x_{(j)}\|_1 \mid j = 1, \ldots, n\} \tag{2.3}$$

where $x \in \mathbb{R}^n$ and for all $1 \le j \le n$, $\tau_j = \left(\sum_{i=1}^j w_i\right)^{-1}$ and $x_{(j)} \in \mathbb{R}^j$ is a vector consisting of the $j$ largest
components of $x$ (where size is measured in terms of magnitude). One interesting consequence of this fact is
that for all $\gamma > 0$ and $z \in \mathbb{R}^n$, we have (from [2, Proposition 23.29])

$$\operatorname{prox}_{\gamma \Omega_w^*}(z) := \underset{x \in \mathbb{R}^n}{\operatorname{argmin}} \left\{ \gamma\,\Omega_w^*(x) + \frac{1}{2}\|x - z\|^2 \right\} = z - P_{B(w, \gamma)}(z). \tag{2.4}$$

Thus, Algorithm 1 (below) also yields an $O(n \log(n))$ algorithm for evaluating $\operatorname{prox}_{\gamma \Omega_w^*}(z)$.
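To make (2.3) concrete, here is a minimal Python sketch that evaluates the dual OWL norm with one sort and one pass over running sums. The function name `owl_dual_norm` is an assumption introduced for illustration; the sketch relies only on $w_1 > 0$, which Definition 2.1 guarantees.

```python
def owl_dual_norm(x, w):
    """Evaluate (2.3): max over j of (sum of j largest |x_i|) / (w_1 + ... + w_j)."""
    mags = sorted((abs(v) for v in x), reverse=True)
    best = 0.0
    sum_x = 0.0   # running l1 norm of the j largest magnitudes
    sum_w = 0.0   # running sum w_1 + ... + w_j, positive because w_1 > 0
    for m, wi in zip(mags, w):
        sum_x += m
        sum_w += wi
        best = max(best, sum_x / sum_w)
    return best


print(owl_dual_norm([1, -3, 2], [2, 2, 2]))  # constant weights: ||x||_inf / w_1
print(owl_dual_norm([1, -3, 2], [1, 0, 0]))  # w = e_1: the l1 norm
```

The two printed cases match the special cases noted after Definition 2.1: constant weights give a scaled $\ell_1$ norm whose dual is a scaled $\ell_\infty$ norm, and $w = (1, 0, \ldots, 0)$ gives the $\ell_\infty$ norm whose dual is the $\ell_1$ norm.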

2.1 A Simplification of the OWL Norm Projection Problem 

The following transformation (which is based on [17, Lemmas 2-4]) will be used as a preprocessing step
in our algorithm. For convenience, we let $\odot : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ denote the componentwise vector product
operator. Finally, for any $z \in \mathbb{R}^n$, let $\operatorname{sign}(z) \in \{-1, 1\}^n$ be the componentwise vector of signs of $z$ (with the
convention $\operatorname{sign}(0) = 1$).
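Proposition 2.1 (Problem Reduction) Let $z \in \mathbb{R}^n$, and let $Q(|z|)$ be the permutation matrix that sorts
$|z|$ to be in nonincreasing order. Then

$$P_{B(w,\varepsilon)}(z) = \operatorname{sign}(z) \odot Q(|z|)^T P_{\mathcal{C}(w,\varepsilon) \cap \mathcal{T}}(Q(|z|)|z|)$$

where $\mathcal{C}(w, \varepsilon) := \{x \in \mathbb{R}^n \mid \langle w, x \rangle \le \varepsilon\}$ and $\mathcal{T} := \{x \in \mathbb{R}^n \mid x_1 \ge x_2 \ge \cdots \ge x_n \ge 0\}$.

Proof Note that $\Omega_w(\operatorname{sign}(z) \odot Q(|z|)^T x) = \Omega_w(x)$ for all $x \in \mathbb{R}^n$, because $\Omega_w$ is invariant under sign changes and permutations. Since $|z| = \operatorname{sign}(z) \odot z$, we have

$$P_{B(w,\varepsilon)}(Q(|z|)|z|) = \underset{\Omega_w(x) \le \varepsilon}{\operatorname{argmin}} \ \frac{1}{2}\left\|Q(|z|)(\operatorname{sign}(z) \odot z) - x\right\|^2$$

and, for every $x$,

$$\left\|Q(|z|)(\operatorname{sign}(z) \odot z) - x\right\|^2 = \left\|z - \operatorname{sign}(z) \odot Q(|z|)^T x\right\|^2.$$

The change of variables $y = \operatorname{sign}(z) \odot Q(|z|)^T x$ preserves the constraint $\Omega_w(x) \le \varepsilon$, so the minimizer satisfies $\operatorname{sign}(z) \odot Q(|z|)^T P_{B(w,\varepsilon)}(Q(|z|)|z|) = P_{B(w,\varepsilon)}(z)$.

Thus, we have shown that for general vectors $z \in \mathbb{R}^n$, we have $P_{B(w,\varepsilon)}(z) = \operatorname{sign}(z) \odot Q(|z|)^T P_{B(w,\varepsilon)}(Q(|z|)|z|)$.
Finally, the result follows from the equality $P_{B(w,\varepsilon)}(Q(|z|)|z|) = P_{\mathcal{C}(w,\varepsilon) \cap \mathcal{T}}(Q(|z|)|z|)$.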
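The reduction in Proposition 2.1 amounts to three bookkeeping steps: record the signs, sort the magnitudes, and undo both after projecting the sorted vector. The Python sketch below shows this wrapper under the assumption that a routine `project_sorted` (a hypothetical callable standing in for the projection onto $\mathcal{C}(w,\varepsilon) \cap \mathcal{T}$) is supplied by the caller.

```python
def project_via_reduction(z, project_sorted):
    """Apply Proposition 2.1: sort |z|, project the sorted vector, then undo.

    project_sorted is a caller-supplied (hypothetical) routine that maps a
    nonincreasing, nonnegative list to its projection.
    """
    n = len(z)
    signs = [1 if v >= 0 else -1 for v in z]            # convention sign(0) = 1
    order = sorted(range(n), key=lambda i: -abs(z[i]))  # plays the role of Q(|z|)
    sorted_abs = [abs(z[i]) for i in order]             # Q(|z|)|z|
    y = project_sorted(sorted_abs)
    x = [0] * n
    for pos, i in enumerate(order):                     # apply Q(|z|)^T, then the signs
        x[i] = signs[i] * y[pos]
    return x


# Stand-in "projection" that just halves its input, to show the bookkeeping.
print(project_via_reduction([3, 2, 1, -1, 2], lambda v: [t / 2 for t in v]))
```

The wrapper costs one sort, so it does not change the overall $O(n \log(n))$ complexity.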

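Thus, whenever $z \in \mathcal{T}$, projecting onto the OWL norm ball is equivalent to projecting onto the set
intersection $\mathcal{C}(w, \varepsilon) \cap \mathcal{T}$:

$$P_{B(w,\varepsilon)}(z) = \underset{x \in \mathbb{R}^n}{\operatorname{argmin}} \ \frac{1}{2}\|x - z\|^2 \quad \text{subject to: } \sum_{i=1}^n w_i x_i \le \varepsilon \text{ and } x_1 \ge x_2 \ge \cdots \ge x_n \ge 0.$$

Finally, we make one more reduction to the problem, which is based on the following simple lemma.

Lemma 2.1 Let $z, w \in \mathcal{T}$ and suppose that $w \ne 0$. If $\langle z, w \rangle \le \varepsilon$, then $P_{B(w,\varepsilon)}(z) = z$. Otherwise,
$\langle P_{B(w,\varepsilon)}(z), w \rangle = \varepsilon$.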

We arrive at our final problem: 

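Problem 2.1 (Reduced Problem) Given $z \in \mathcal{T}$ such that $\langle z, w \rangle > \varepsilon$, find

$$x^* := \underset{x \in \mathbb{R}^n}{\operatorname{argmin}} \ \frac{1}{2}\|x - z\|^2 \quad \text{subject to: } \sum_{i=1}^n w_i x_i = \varepsilon \text{ and } x_1 \ge x_2 \ge \cdots \ge x_n \ge 0. \tag{2.5}$$

Now define $\mathcal{H}(w, \varepsilon) = \{x \in \mathbb{R}^n \mid \langle w, x \rangle = \varepsilon\}$. Then $x^* = P_{\mathcal{H}(w,\varepsilon) \cap \mathcal{T}}(z)$.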

The following proposition is a straightforward exercise in convex analysis. 

Proposition 2.2 (KKT Conditions) The point $x^*$ satisfies Equation (2.5) if, and only if, there exists
$\lambda^* \in \mathbb{R}_{++}$ and a vector $v^* \in \mathbb{R}^n_+$ such that

1. $x^* \in \mathcal{T}$;

2. $v_i^* (x_i^* - x_{i+1}^*) = 0$ for $1 \le i < n$ and $v_n^* x_n^* = 0$;

3. $x_i^* = z_i - \lambda^* w_i + v_i^* - v_{i-1}^*$ for $1 \le i \le n$ where $v_0^* := 0$;

4. and $\langle x^*, w \rangle = \varepsilon$.

We now record the solution to (2.5) in the special case that w is the constant vector. 

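Proposition 2.3 (Projection Onto the Simplex [9]) Let $n > 0$ and let $\Delta(\kappa, n)$ denote the simplex
$\{x \in \mathbb{R}^n \mid 0 \le x \le \kappa \text{ and } \sum_{i=1}^n x_i = \kappa\}$. Let $z, w \in \mathcal{T}$ and suppose that $w \ne 0$. In addition, suppose that
$w_1 = w_2 = \cdots = w_n$. Then $x^* = P_{\Delta(\varepsilon/w_1, n)}(z)$ is the solution to Problem (2.5). In other words, we can
replace the constraint $x \in \mathcal{T}$ with $x \in \mathbb{R}^n_+$ in Problem (2.5). Furthermore, $x^* = \max\{z - \lambda, 0\}$ where

$$\lambda := \frac{\sum_{i=1}^K z_i - \varepsilon/w_1}{K} \quad \text{and} \quad K := \max\left\{ k \in \{1, \ldots, n\} \ \middle|\ \frac{\sum_{i=1}^k z_i - \varepsilon/w_1}{k} < z_k \right\}.$$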
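The closed form in Proposition 2.3 is the classical sort-and-threshold projection onto a simplex. A minimal Python sketch follows; the function name `project_simplex` and the parameter name `kappa` (standing for $\varepsilon / w_1$) are assumptions made for illustration. The sketch sorts internally, so it also accepts unsorted input, although Proposition 2.3 only needs the case $z \in \mathcal{T}$.

```python
def project_simplex(z, kappa):
    """Project z onto {x >= 0, sum(x) = kappa} via x = max(z - lam, 0)."""
    u = sorted(z, reverse=True)
    total = 0.0
    lam = 0.0
    for k, uk in enumerate(u, start=1):
        total += uk
        candidate = (total - kappa) / k
        if candidate < uk:      # k still satisfies the condition defining K
            lam = candidate     # the last k that passes gives lambda
    return [max(v - lam, 0.0) for v in z]


# Final step of the worked example in Appendix C: z = 9/5 in every entry,
# w_1 = 14/5 and eps = 1, so kappa = 5/14.
print(project_simplex([9 / 5] * 5, 5 / 14))
```

In that example the threshold is $\lambda = 121/70$, and every coordinate of the result is $9/5 - 121/70 = 1/14$.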
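2.2 Partitions

Define $\mathcal{P}_n$ to be the set of partitions of $\{1, \ldots, n\}$ built entirely from intervals of integers. For example,
when $n = 5$, the partition $\mathcal{G} := \{\{1, 2\}, \{3\}, \{4, 5\}\}$ is an element of $\mathcal{P}_5$, but $\mathcal{G}' := \{\{1, 3\}, \{2, 4, 5\}\}$ is not an
element of $\mathcal{P}_5$ because $\{1, 3\}$ and $\{2, 4, 5\}$ are not intervals. For two partitions $\mathcal{G}_1, \mathcal{G}_2 \in \mathcal{P}_n$, we say that

$\mathcal{G}_1 \preceq \mathcal{G}_2$ if for all $G_1 \in \mathcal{G}_1$, there exists $G_2 \in \mathcal{G}_2$ with $G_1 \subseteq G_2$.

Note that if $\mathcal{G}_1 \preceq \mathcal{G}_2$ and $\mathcal{G}_2 \preceq \mathcal{G}_1$, then $\mathcal{G}_1 = \mathcal{G}_2$. In addition, we have the following fact:

Lemma 2.2 Let $\mathcal{G}_1, \mathcal{G}_2 \in \mathcal{P}_n$. If $\mathcal{G}_1 \preceq \mathcal{G}_2$ and $|\mathcal{G}_1| = |\mathcal{G}_2|$, then $\mathcal{G}_1 = \mathcal{G}_2$.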

Suppose that we partition a vector $z \in \mathbb{R}^n$ into $g$ maximal groups of nondecreasing components

$$z = (\underbrace{z_1, \ldots, z_{n_1}}_{G_1(z)}, \underbrace{z_{n_1+1}, \ldots, z_{n_2}}_{G_2(z)}, \ldots, \underbrace{z_{n_{g-1}+1}, \ldots, z_{n_g}}_{G_g(z)})$$

where $z_{n_j} > z_{n_j+1}$ for all $1 \le j \le g - 1$, and inside each group, $z$ is a nondecreasing list of numbers (i.e.,
$z_k \le z_{k+1}$ whenever $k, k+1 \in G_j(z)$ for some $j \in \{1, \ldots, g\}$). Note that $g$ can be 1, in which case we let
$n_0 = 1$. We let

$$\mathcal{G}(z) := \{G_1(z), \ldots, G_g(z)\} \in \mathcal{P}_n. \tag{2.6}$$

For example, for $z := (1, 4, 5, 1, 3)^T$, we have $\mathcal{G}(z) = \{\{1, 2, 3\}, \{4, 5\}\}$, $g = 2$, $G_1(z) = \{1, 2, 3\}$, and
$G_2(z) = \{4, 5\}$. Note that when $z \in \mathcal{T}$, the vector $z$ is constant within each group.

For simplicity, whenever $x^*$ is a solution to (2.5), we define

$$\mathcal{G}^* := \mathcal{G}(x^*). \tag{2.7}$$

Finally, for simplicity, we will also drop the dependence of the groups on $z$: $G_i := G_i(z)$.

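For any vector $z \in \mathbb{R}^n$ and any partition $\mathcal{G} = \{G_1, \ldots, G_g\} \in \mathcal{P}_n$, define an averaged vector: for all
$j = 1, \ldots, g$ and $i \in G_j$, let

$$(z_{\mathcal{G}})_i := \frac{1}{|G_j|} \sum_{k \in G_j} z_k. \tag{2.8}$$

For example, for $z := (1, 4, 5, 1, 3)^T$ and $\mathcal{G} := \{\{1, 2\}, \{3, 4\}, \{5\}\}$, we have $z_{\mathcal{G}} = (5/2, 5/2, 3, 3, 3)^T$. Note that
$z_{\mathcal{G}} \in \mathcal{T}$ whenever $z \in \mathcal{T}$.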
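The two operations of this section, forming $\mathcal{G}(z)$ as in (2.6) and averaging over a partition as in (2.8), can be sketched in a few lines of Python. The helper names `groups` and `average` are hypothetical, and the code uses 0-based indices, so the group $\{1, 2, 3\}$ of the text appears as `[0, 1, 2]`.

```python
def groups(z):
    """Return G(z): maximal runs of consecutive indices on which z is nondecreasing."""
    parts, current = [], [0]
    for i in range(1, len(z)):
        if z[i - 1] <= z[i]:
            current.append(i)      # still nondecreasing: extend the current group
        else:
            parts.append(current)  # strict decrease: close the group
            current = [i]
    parts.append(current)
    return parts


def average(z, parts):
    """Return z_G as in (2.8): every entry is replaced by the mean of its group."""
    out = list(z)
    for g in parts:
        mean = sum(z[i] for i in g) / len(g)
        for i in g:
            out[i] = mean
    return out


z = [1, 4, 5, 1, 3]
print(groups(z))                           # the example following (2.6)
print(average(z, [[0, 1], [2, 3], [4]]))   # the example following (2.8)
```

Both helpers run in linear time for a given vector; the cost of Algorithm 1 comes from how often they must be re-applied.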

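The following proposition will allow us to repeatedly apply transformations to the vectors $z$ and $w$ without
changing the solution to (2.5).

Proposition 2.4 (Increasing Partitions) Let $z, w \in \mathcal{T}$ and suppose that $w \ne 0$.

1. Suppose that $\lambda^* \ge \lambda$ (where $\lambda^*$ is as in Proposition 2.2). Then we have

$$\mathcal{G}(z) \preceq \mathcal{G}(z - \lambda w) = \mathcal{G}(z_{\mathcal{G}(z - \lambda w)}) \preceq \mathcal{G}^*.$$

2. We have $x^* = P_{\mathcal{H}(w_{\mathcal{G}}, \varepsilon) \cap \mathcal{T}}(z_{\mathcal{G}})$ whenever $\mathcal{G} \preceq \mathcal{G}^*$.

Proof See Appendix A. □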
3 The Algorithm


The following proposition is the workhorse of our algorithm. It provides a set of 6 alternatives, three of 
which give a solution when true; the other three allow us to update the vectors z and w so that G(z) strictly 
decreases in size, while keeping x* fixed. Clearly, the size of this partition must always be greater than 0, 
which ensures that our algorithm terminates in a finite number of steps. 


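Proposition 3.1 (The 6 Alternatives) Let $z, w \in \mathcal{T}$. Suppose that $w \ne 0$, that $\langle w, z \rangle > \varepsilon$, and $w = w_{\mathcal{G}(z)}$.
Let

$$r := \min\left\{ \frac{z_i - z_{i+1}}{w_i - w_{i+1}} \ \middle|\ i = 1, \ldots, n - 1 \right\}$$

where we use the convention that $0/0 = \infty$. Define

$$\lambda_0 := \frac{\sum_{\{i \mid z_i > z_n\}} z_i w_i - \varepsilon}{\sum_{\{i \mid z_i > z_n\}} w_i^2} \quad \text{and} \quad \lambda_1 := \frac{\langle z, w \rangle - \varepsilon}{\|w\|^2}.$$

Then $\lambda^* \ge \lambda_1$ (where $\lambda^*$ is as in Proposition 2.2).

Let $n' := \min(\{k \mid z_k - \lambda_0 w_k < 0\} \cup \{n + 1\})$. Then one of the following mutually exclusive alternatives
must hold:

1. If $r = \infty$, we have $x^* = P_{\Delta(\varepsilon/w_1, n)}(z)$.

2. If $\lambda_1 > r$, then $\lambda^* \ge \lambda_1 > r$.

3. If $\lambda_1 \le r < \infty$ and $z_n - \lambda_1 w_n \ge 0$, then $x^* = z - \lambda_1 w$.

4. If $\lambda_1 \le r < \infty$, $z_n - \lambda_1 w_n < 0$ and $\lambda_0 > r$, then $\lambda^* \ge \lambda_0 > r$.

5. If $\lambda_1 \le r < \infty$, $z_n - \lambda_1 w_n < 0$, $\lambda_0 \le r$, and $n' \le n$ with $z_{n'} = z_n$, then $x^* = \max\{z - \lambda_0 w, 0\}$.

6. If $\lambda_1 \le r < \infty$, $z_n - \lambda_1 w_n < 0$, $\lambda_0 \le r$, and $n' \le n$ with $z_{n'} \ne z_n$, then $\mathcal{G}_0 \preceq \mathcal{G}^*$ where $\mathcal{G}_0 = \{G \in \mathcal{G}(z) \mid \max(G) < n'\} \cup \{\{n', \ldots, n\}\}$.

7. It cannot be the case that $\lambda_1 \le r < \infty$, $z_n - \lambda_1 w_n < 0$, $\lambda_0 \le r$, and $n' = n + 1$.

In addition, whenever $\lambda^* \ge \lambda > r$, we have $\mathcal{G}(z - \lambda w) \preceq \mathcal{G}^*$ and $|\mathcal{G}(z_{\mathcal{G}(z - \lambda w)})| \le |\mathcal{G}(z)| - 1$. Similarly, when 6
holds, we have $\mathcal{G}_0 \in \mathcal{P}_n$, $\mathcal{G}(z) \preceq \mathcal{G}_0 = \mathcal{G}(z_{\mathcal{G}_0}) \preceq \mathcal{G}^*$, and $|\mathcal{G}(z_{\mathcal{G}_0})| \le |\mathcal{G}(z)| - 1$. In particular, if $\mathcal{G}(z) = \mathcal{G}^*$,
then at least one of steps 1, 3, and 5 will not fail.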

Proof See Appendix B. □ 

We are now ready to present our algorithm. It repeatedly transforms the vectors z and w after checking 
whether Proposition 3.1 yields a solution to (2.5). Note that we assume the input is sorted and nonnegative. 
Thus to project onto the OWL ball with Algorithm 1, the preprocessing in Proposition 2.1 must be applied 
first. Please see Appendix C for an example of Algorithm 1. 

Algorithm 1 (Algorithm to solve (2.5)) Let $z \in \mathcal{T}$, $w \in \mathcal{T} \setminus \{0\}$, and $\varepsilon \in \mathbb{R}_{++}$.

Initialize:

1. $w \leftarrow w_{\mathcal{G}(z)}$;

Repeat:

1. Computation:

   (a) $r \leftarrow \min\left\{ \frac{z_i - z_{i+1}}{w_i - w_{i+1}} \ \middle|\ i = 1, \ldots, n - 1 \right\}$ (where $0/0 = \infty$);

   (b) Define
       $$\lambda_0 \leftarrow \frac{\sum_{\{i \mid z_i > z_n\}} z_i w_i - \varepsilon}{\sum_{\{i \mid z_i > z_n\}} w_i^2} \quad \text{and} \quad \lambda_1 \leftarrow \frac{\langle z, w \rangle - \varepsilon}{\|w\|^2};$$

   (c) $n' \leftarrow \min(\{k \mid z_k - \lambda_0 w_k < 0\} \cup \{n + 1\})$;

   (d) $\mathcal{G}_0(z) \leftarrow \{G \in \mathcal{G}(z) \mid \max(G) < n'\} \cup \{\{n', \ldots, n\}\}$;

2. Tests:

   (a) If $\langle z, w \rangle \le \varepsilon$, set
       i. $x^* \leftarrow z$.
       Exit;

   (b) If $r = \infty$, set
       i. $x^* \leftarrow P_{\Delta(\varepsilon/w_1, n)}(z)$;
       Exit;

   (c) If $\lambda_1 > r$, set
       i. $z \leftarrow z_{\mathcal{G}(z - \lambda_1 w)}$;
       ii. $w \leftarrow w_{\mathcal{G}(z - \lambda_1 w)}$;
       Go to step 1.

   (d) If $\lambda_1 \le r < \infty$ and $z_n - \lambda_1 w_n \ge 0$, set
       i. $x^* \leftarrow z - \lambda_1 w$
       Exit;

   (e) If $\lambda_1 \le r < \infty$, $z_n - \lambda_1 w_n < 0$ and $\lambda_0 > r$, set
       i. $z \leftarrow z_{\mathcal{G}(z - \lambda_0 w)}$;
       ii. $w \leftarrow w_{\mathcal{G}(z - \lambda_0 w)}$;
       Go to step 1.

   (f) If $\lambda_1 \le r < \infty$, $z_n - \lambda_1 w_n < 0$, $\lambda_0 \le r$, and $n' \le n$ with $z_{n'} = z_n$, set
       i. $x^* \leftarrow \max\{z - \lambda_0 w, 0\}$.
       Exit;

   (g) If $\lambda_1 \le r < \infty$, $z_n - \lambda_1 w_n < 0$, $\lambda_0 \le r$, and $n' \le n$ with $z_{n'} \ne z_n$, set
       i. $z \leftarrow z_{\mathcal{G}_0}$;
       ii. $w \leftarrow w_{\mathcal{G}_0}$;
       Go to step 1.

Output: $x^*$.

In tests (c) and (e), both updates use the partition computed from $z$ and $w$ before either vector is modified.

With the previous results, the following theorem is almost immediate.

Theorem 3.1 Algorithm 1 converges to $x^*$ in at most $n$ outer loops.

Proof By Proposition 2.4, $x^* = P_{\mathcal{H}(w,\varepsilon) \cap \mathcal{T}}(z) = P_{\mathcal{H}(w_{\mathcal{G}(z)},\varepsilon) \cap \mathcal{T}}(z_{\mathcal{G}(z)}) = P_{\mathcal{H}(w_{\mathcal{G}(z)},\varepsilon) \cap \mathcal{T}}(z)$, so we can assume
that $w = w_{\mathcal{G}(z)}$ from the start. Furthermore, throughout this process $z$ and $w$ are updated to maintain that
$\mathcal{G}(z) \preceq \mathcal{G}^*$, and so we can apply Proposition 3.1 at every iteration. In particular, Proposition 3.1 implies
that during every iteration of Algorithm 1, $z$ and $w$ must pass exactly one test. If tests 2a, 2b, 2d, or 2f are
passed, the algorithm terminates with the correct solution. If tests 2c, 2e, or 2g are passed, then we update $z$
and $w$, and the set $\mathcal{G}(z)$ decreases in size by at least one. Because $1 \le |\mathcal{G}(z)| \le n$, this process must terminate
in at most $n$ outer loops. □

The naive implementation of Algorithm 1 has worst case complexity bounded above by $O(n^2 \log(n))$
because we must continually sort the ratios in Step 1a and update the vectors $z$ and $w$ through averaging
in Algorithm 1. However, it is possible to keep careful track of $\lambda_0$, $\lambda_1$, $r$, $z$, and $w$ and get an $O(n \log(n))$
implementation of Algorithm 1. In order to prove this, we need to use a data structure that is similar to a
relational database.
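For orientation, the naive variant of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' C++/MEX implementation: the names `groups`, `average`, `simplex_case`, and `solve_reduced` are hypothetical, indices are 0-based, the inputs are assumed to be already sorted and nonnegative (that is, the preprocessing of Proposition 2.1 has been applied), and exact rational arithmetic (`fractions.Fraction`) is assumed so that the equality tests on $z$ and $w$ are reliable. It recomputes $r$, $\lambda_0$, and $\lambda_1$ from scratch in every pass, which is what the efficient implementation avoids.

```python
from fractions import Fraction
from math import inf


def groups(v):
    """Maximal runs of consecutive indices on which v is nondecreasing."""
    parts, cur = [], [0]
    for i in range(1, len(v)):
        if v[i - 1] <= v[i]:
            cur.append(i)
        else:
            parts.append(cur)
            cur = [i]
    parts.append(cur)
    return parts


def average(v, parts):
    """Replace every entry of v by the mean of its group, as in (2.8)."""
    out = list(v)
    for g in parts:
        mean = sum(v[i] for i in g) / len(g)
        for i in g:
            out[i] = mean
    return out


def simplex_case(z, kappa):
    """Proposition 2.3 for sorted z: x = max(z - lam, 0) with sum(x) = kappa."""
    total, lam = 0, None
    for k, zk in enumerate(z, start=1):
        total += zk
        if (total - kappa) / k < zk:
            lam = (total - kappa) / k
    return [max(zi - lam, 0) for zi in z]


def solve_reduced(z, w, eps):
    """Naive sketch of Algorithm 1; z and w are sorted, nonnegative Fractions."""
    z, w, n = list(z), list(w), len(z)
    w = average(w, groups(z))                          # Initialize: w <- w_{G(z)}
    while True:
        zw = sum(a * b for a, b in zip(z, w))
        if zw <= eps:                                  # test 2a
            return z
        r = min([(z[i] - z[i + 1]) / (w[i] - w[i + 1])
                 for i in range(n - 1) if w[i] != w[i + 1]], default=inf)
        if r == inf:                                   # test 2b
            return simplex_case(z, eps / w[0])
        lam1 = (zw - eps) / sum(a * a for a in w)
        if lam1 > r:                                   # test 2c
            parts = groups([a - lam1 * b for a, b in zip(z, w)])
        elif z[-1] - lam1 * w[-1] >= 0:                # test 2d
            return [a - lam1 * b for a, b in zip(z, w)]
        else:
            head = [i for i in range(n) if z[i] > z[-1]]
            lam0 = (sum(z[i] * w[i] for i in head) - eps) / sum(w[i] ** 2 for i in head)
            # 0-based n'; the value n plays the role of n + 1 in the text
            n1 = next((k for k in range(n) if z[k] - lam0 * w[k] < 0), n)
            if lam0 > r:                               # test 2e
                parts = groups([a - lam0 * b for a, b in zip(z, w)])
            elif z[n1] == z[-1]:                       # test 2f (n1 < n by alternative 7)
                return [max(a - lam0 * b, 0) for a, b in zip(z, w)]
            else:                                      # test 2g: merge the tail {n', ..., n}
                parts = [g for g in groups(z) if g[-1] < n1] + [list(range(n1, n))]
        z, w = average(z, parts), average(w, parts)    # both use the same partition


z = [Fraction(v) for v in (3, 2, 2, 1, 1)]
w = [Fraction(v) for v in (5, 4, 3, 1, 1)]
print(solve_reduced(z, w, Fraction(1)))
```

The usage lines reproduce the sorted instance of the worked example in Appendix C, whose solution has every entry equal to $1/14$. With floating-point inputs the exact comparisons would need tolerances, which this sketch does not attempt.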


Theorem 3.2 (Complexity of Algorithm 1) There is an $O(n \log(n))$ implementation of Algorithm 1.

Proof The key idea is to introduce a data structure $T_{\mathcal{G}} = \{t_{G_1}, \ldots, t_{G_g}\}$ consisting of 5-tuples, one for each
group in a given partition $\mathcal{G} = \{G_1, \ldots, G_g\}$:

$$\forall i \in \{1, \ldots, g\} \quad t_{G_i} := (r_{G_i}, \min(G_i), \max(G_i), S(G_i, z), S(G_i, w)) \in \mathbb{R} \times \mathbb{N} \times \mathbb{N} \times \mathbb{R}^2$$

where for any vector $x \in \mathbb{R}^n$, we let $S(G, x) = \sum_{i \in G} x_i$ and the ratios $r_{G_i}$ are defined by

$$\forall i \in \{1, \ldots, g - 1\} \quad r_{G_i} := \frac{\dfrac{S(G_i, z)}{|G_i|} - \dfrac{S(G_{i+1}, z)}{|G_{i+1}|}}{\dfrac{S(G_i, w)}{|G_i|} - \dfrac{S(G_{i+1}, w)}{|G_{i+1}|}},$$

and $r_{G_g} = \infty$.

Notice that $S(G, z) = S(G, z_{\mathcal{G}})$ and $S(G, w) = S(G, w_{\mathcal{G}})$. We assume that the data structure $T_{\mathcal{G}}$ maintains 2
ordered-set views of the underlying tuples $t_G$, one of which is ordered by $r_G$, and another that is ordered by
$\min(G)$. We also assume that the data structure allows us to convert iterators between views in constant time.
This ensures that we can find the position of $t_G$ with $G \in \operatorname{argmin}\{r_G \mid G \in \mathcal{G}\}$ in the view ordered by $r_G$ in
time $O(\log(|\mathcal{G}|))$ and convert this to an iterator (at the tuple $t_G$) in the view ordered by $\min(G)$ in constant
time. We also assume that the “delete,” “find,” and “insert” operations have complexity $O(\log(|\mathcal{G}|))$. We
note that this functionality can be implemented with the Boost Multi-Index Containers Library [11].

Now, the first step of Algorithm 1 is to build the data structure $T_{\mathcal{G}(z)}$, which requires $O(n \log(n))$ operations.
The remaining steps of the algorithm simply modify $T_{\mathcal{G}(z)}$ by merging and deleting tuples. Suppose
that Algorithm 1 terminates in $K$ steps for some $K \in \{1, \ldots, n\}$. For $i = 1, \ldots, K$, let $\mathcal{G}_i$ be the partition at
the current iteration, and let $m_i = |\mathcal{G}_i|$. Notice that for $i < K$, we have $\mathcal{G}_i \preceq \mathcal{G}_{i+1}$, so we get $\mathcal{G}_{i+1}$ by
merging groups in $\mathcal{G}_i$, and $m_i > m_{i+1}$. Finally, we also maintain two numbers throughout the algorithm:
$I_{\mathcal{G}_i} = \langle z_{\mathcal{G}_i}, w_{\mathcal{G}_i} \rangle$ and $N_{\mathcal{G}_i} = \|w_{\mathcal{G}_i}\|^2$. Given $I_{\mathcal{G}_i}$ and $N_{\mathcal{G}_i}$, we can compute $\lambda_1$ and $\lambda_0$ in constant time.

Now fix $i \in \{1, \ldots, K - 1\}$. Suppose that we get from iteration $i$ to $i + 1$ through one of the updates
$\mathcal{G}_{i+1} = \mathcal{G}(z_{\mathcal{G}_i} - \lambda_1 w_{\mathcal{G}_i})$ or $\mathcal{G}_{i+1} = \mathcal{G}(z_{\mathcal{G}_i} - \lambda_0 w_{\mathcal{G}_i})$. We note that each of these updates to $T_{\mathcal{G}_i}$ can be performed
in at most $O((m_i - m_{i+1}) \log(m_i))$ steps because we call at most $O(m_i - m_{i+1})$ “find”, “insert”, “delete”,
and “merge” operations on the structure $T_{\mathcal{G}_i}$ to get $T_{\mathcal{G}_{i+1}}$, and at most $O(m_i - m_{i+1})$ modifications to the
variables $I_{\mathcal{G}_i}$ and $N_{\mathcal{G}_i}$ to get $I_{\mathcal{G}_{i+1}}$ and $N_{\mathcal{G}_{i+1}}$. Likewise, it is easy to see that modifications of the form
$\mathcal{G}_{i+1} \leftarrow \mathcal{G}_0(z_{\mathcal{G}_i})$ can be implemented to run in $O((m_i - m_{i+1}) \log(m_i))$ time.

Therefore, the total complexity of Algorithm 1 is

$$O\left(n \log(n) + \sum_{i=1}^{K-1} (m_i - m_{i+1}) \log(m_i)\right) = O\left(n \log(n) + \sum_{i=1}^{K-1} (m_i - m_{i+1}) \log(n)\right) = O(n \log(n)).$$


□ 
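The per-group tuples in the proof above can be illustrated with a small Python sketch. It keeps only the fields needed to merge two adjacent groups in constant time and to recompute the ratio between neighbours; the ordered-set views and iterator conversion that give the logarithmic bounds are deliberately left out. The class name `GroupTuple` and the helpers `merge` and `ratio` are hypothetical and do not mirror the authors' Boost-based code.

```python
from dataclasses import dataclass
from math import inf


@dataclass
class GroupTuple:
    lo: int        # min(G)
    hi: int        # max(G)
    sum_z: float   # S(G, z)
    sum_w: float   # S(G, w)

    def size(self):
        return self.hi - self.lo + 1

    def mean_z(self):
        return self.sum_z / self.size()

    def mean_w(self):
        return self.sum_w / self.size()


def merge(a, b):
    """Merge two adjacent groups; the sums simply add, so this is O(1)."""
    assert a.hi + 1 == b.lo, "only neighbouring intervals can be merged"
    return GroupTuple(a.lo, b.hi, a.sum_z + b.sum_z, a.sum_w + b.sum_w)


def ratio(a, b):
    """Ratio r_G between a group and its right neighbour (infinite if the w means agree)."""
    dw = a.mean_w() - b.mean_w()
    return inf if dw == 0 else (a.mean_z() - b.mean_z()) / dw


# Groups {1}, {2,3}, {4,5} of z = (3,2,2,1,1) with weights (5,4,3,1,1).
t1, t2, t3 = GroupTuple(1, 1, 3, 5), GroupTuple(2, 3, 4, 7), GroupTuple(4, 5, 2, 2)
print(ratio(t1, t2), ratio(t2, t3))   # the two finite ratios of iteration 1 in Appendix C
print(ratio(t1, merge(t2, t3)))       # the ratio after the first merge
```

Storing sums rather than averaged vectors is the design choice that makes a merge cheap: the averaged values $z_{\mathcal{G}}$ and $w_{\mathcal{G}}$ are never materialized until the final answer is written out.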


4 Numerical Results 

In this section we present some numerical experiments to demonstrate the utility of the OWL norm and test 
our C++ implementation and MATLAB MEX file wrapper. 








An $O(n \log(n))$ Algorithm for Projecting Onto the Ordered Weighted $\ell_1$ Norm Ball




Fig. 4.1 (panels (a) and (b)): We solve Problem (4.1) for $d = 5, 10$ with Douglas-Rachford splitting (DRS) [10], Forward-Backward
splitting (FBS) [13], and an accelerated forward-backward splitting method (dubbed FISTA [3]).
Note that the optimal objective value is 0 because $\varepsilon = \Omega_w(x_{\mathrm{true}})$. In Figure 4.1b, there is a delay in the
FBS and FISTA methods due to an initial investment in computing $\|A\|$, which is quite expensive. The test
was run on a PC with 32GB memory and an Intel i5-3570 CPU with Ubuntu 12.04 and Matlab R2011b.


4.1 Synthetic Regression Test 


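We adopt and slightly modify the experimental setup of [17, Section V.A]. We choose an integer $d \ge 1$, and
generate a vector

$$x_{\mathrm{true}} := (\underbrace{0, \ldots, 0}_{150d}, \underbrace{3, \ldots, 3}_{50d}, \underbrace{0, \ldots, 0}_{250d}, \underbrace{-4, \ldots, -4}_{50d}, \underbrace{0, \ldots, 0}_{250d}, \underbrace{6, \ldots, 6}_{50d}, \underbrace{0, \ldots, 0}_{250d})^T \in \mathbb{R}^{1000d}.$$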

We generate a random matrix $A = [A_1, \ldots, A_{1000d}] \in \mathbb{R}^{1000d \times 1000d}$ where the columns $A_i \in \mathbb{R}^{1000d}$ follow a
multivariate Gaussian distribution with $\operatorname{cov}(A_i, A_j) = \ldots$, after which the columns are standardized and
centered. Then we generate a measurement vector $b = A x_{\mathrm{true}} + \nu$ where $\nu$ is Gaussian noise with variance
$0.01$.
.01. Next we generate w with OSCAR parameters /xi = 10“ 3 and /j,^ 5 (See Equation (2.2)). Finally, we set 

$\varepsilon = \Omega_w(x_{\mathrm{true}})$.

To test our implementation, we solve the regression problem 


$$\underset{x \in \mathbb{R}^n}{\text{minimize}} \ \frac{1}{2}\|Ax - b\|^2 \quad \text{subject to: } \Omega_w(x) \le \varepsilon \tag{4.1}$$

with three different proximal splitting algorithms. We plot the results in Figure 4.1.


4.2 Standalone Tests 

In Table 4.1 we display the timings for our MATLAB MEX implementation of Algorithm 1. Note that
solutions to (4.1) can be quite sparse (although usually not as sparse as solutions to (1.2)), so the iterates
generated by algorithms that solve (4.1), such as those applied in Figure 4.1, are sparse as well. For this reason, we
test our implementation on high-dimensional vectors of varying sparsity levels.
Density    length n
           10^3       10^4       10^5       10^6
100%       3.6e-04    5.1e-03    6.8e-02    1.6
50%        2.1e-04    3.1e-03    3.8e-02    8.3e-01
25%        1.1e-04    1.6e-03    2.0e-02    3.7e-01
10%        5.6e-05    8.5e-04    1.0e-02    1.4e-01

Table 4.1: Average timings in seconds (over 100 runs) for random Gaussian vectors with different density
levels (measured in percentage of nonzero entries). The test was run on a PC with 32GB memory and an
Intel i5-3570 CPU with Ubuntu 12.04 and Matlab R2011b.


5 Conclusion 

In this paper, we introduced an $O(n \log(n))$ algorithm to project onto the OWL norm ball. Previously, there
was no algorithm to compute this projection in a finite number of steps. We also evaluated our algorithm
with a synthetic regression test. A C++ implementation of our algorithm with a MEX wrapper is available
at the authors’ website.
Appendix

A Proof of Proposition 2.4

First we prove a simple fact that we will use throughout the following proofs. Intuitively, it states that $\mathcal{G}_1 \preceq \mathcal{G}_2$ if, and only if,
$\mathcal{G}_2$ does not split groups in $\mathcal{G}_1$.

Lemma A.1 (Equivalent conditions for nested partitions) Let $\mathcal{G}_1, \mathcal{G}_2 \in \mathcal{P}_n$. Then $\mathcal{G}_1 \preceq \mathcal{G}_2$ if, and only if, for every
$i \in \{1, \ldots, n\}$ such that there exists a group $G_1 \in \mathcal{G}_1$ with $i, i + 1 \in G_1$, there exists a group $G_2 \in \mathcal{G}_2$ such that $i, i + 1 \in G_2$.

Proof (Proof of lemma) $\Rightarrow$: This direction is clear by definition of $\preceq$.

$\Leftarrow$: Suppose that $G_1 = \{i_1, \ldots, i_k\} \in \mathcal{G}_1$ for some $k \ge 1$. If $|G_1| = 1$, the partition property implies there exists $G_2 \in \mathcal{G}_2$
containing $G_1$. Suppose $|G_1| > 1$. For each $i_j$ with $j = 1, \ldots, k - 1$, there exists $G_2^j \in \mathcal{G}_2$ with $i_j, i_{j+1} \in G_2^j$. Notice that each
of the adjacent $G_2^j$ sets intersect: $i_j \in G_2^{j-1} \cap G_2^j$ for $j = 2, \ldots, k - 1$. Thus, by the partition property, all $G_2^j$ are the same and
hence, $G_1 \subseteq G_2^j$ for any such $j$. Thus, $\mathcal{G}_1 \preceq \mathcal{G}_2$. □

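Part 1: Let $i \in \{1, \ldots, n - 1\}$. Suppose that $z_i = z_{i+1}$. Then $z_i - z_{i+1} = 0 \le \lambda (w_i - w_{i+1})$, i.e., $z_i - \lambda w_i \le z_{i+1} - \lambda w_{i+1}$.
Therefore, by Lemma A.1, we have $\mathcal{G}(z) \preceq \mathcal{G}(z - \lambda w)$.

Next, suppose that $z_i - \lambda w_i \le z_{i+1} - \lambda w_{i+1}$ where $z_i$ is not necessarily equal to $z_{i+1}$. Then $i$ and $i + 1$ are in the
same group in $\mathcal{G}(z - \lambda w)$. Thus, by Equation (2.8), we have $(z_{\mathcal{G}(z - \lambda w)})_i = (z_{\mathcal{G}(z - \lambda w)})_{i+1}$. Therefore, by Lemma A.1, we
have $\mathcal{G}(z - \lambda w) \preceq \mathcal{G}(z_{\mathcal{G}(z - \lambda w)})$. Conversely, suppose that $(z_{\mathcal{G}(z - \lambda w)})_i = (z_{\mathcal{G}(z - \lambda w)})_{i+1}$, but $z_i - \lambda w_i > z_{i+1} - \lambda w_{i+1}$.
Then $i$ and $i + 1$ are not in the same group in $\mathcal{G}(z - \lambda w)$ and, in particular, $z_i - z_{i+1} > \lambda (w_i - w_{i+1}) \ge 0$. Thus, because
$z \in \mathcal{T}$, we have $(z_{\mathcal{G}(z - \lambda w)})_i \ge z_i > z_{i+1} \ge (z_{\mathcal{G}(z - \lambda w)})_{i+1}$, which is a contradiction. Therefore, by Lemma A.1, we have
$\mathcal{G}(z - \lambda w) \succeq \mathcal{G}(z_{\mathcal{G}(z - \lambda w)})$, and so $\mathcal{G}(z - \lambda w) = \mathcal{G}(z_{\mathcal{G}(z - \lambda w)})$.

Finally, suppose that there exists $i \in \{1, \ldots, n - 1\}$ such that $z_i - \lambda w_i \le z_{i+1} - \lambda w_{i+1}$. Then by Proposition 2.2,

$$x_i^* - x_{i+1}^* = (z_i - \lambda^* w_i) - (z_{i+1} - \lambda^* w_{i+1}) + 2 v_i^* - (v_{i-1}^* + v_{i+1}^*).$$

If $x_i^* \ne x_{i+1}^*$, then $2 v_i^* = 0$ so the expression on the left is nonpositive, which is a contradiction. Thus, $x_i^* = x_{i+1}^*$. Therefore,
by Lemma A.1, we have $\mathcal{G}(z_{\mathcal{G}(z - \lambda w)}) = \mathcal{G}(z - \lambda w) \preceq \mathcal{G}^*$.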

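Part 2: Note that

$$\langle w_{\mathcal{G}}, x^* \rangle = \sum_{G \in \mathcal{G}} \sum_{i \in G} x_i^* \left( \frac{1}{|G|} \sum_{j \in G} w_j \right) = \langle w, x^* \rangle = \varepsilon$$

because $x^*$ is constant along each group $G$. Thus, $x^* \in \mathcal{H}(w_{\mathcal{G}}, \varepsilon) \cap \mathcal{T}$. Let $x^0 = P_{\mathcal{H}(w_{\mathcal{G}}, \varepsilon) \cap \mathcal{T}}(z_{\mathcal{G}})$. We will show that $x^* = x^0$.
Indeed, $\mathcal{G} = \mathcal{G}(z_{\mathcal{G}}) \preceq \mathcal{G}(x^0)$ and

$$\langle w, x^0 \rangle = \sum_{G \in \mathcal{G}} \sum_{i \in G} x_i^0 \left( \frac{1}{|G|} \sum_{j \in G} w_j \right) = \langle w_{\mathcal{G}}, x^0 \rangle = \varepsilon$$

because $x^0$ is constant along each group. Therefore, $x^0 \in \mathcal{H}(w, \varepsilon) \cap \mathcal{T}$. In addition, for all $G \in \mathcal{G}$, we have $x_i^0 = x_j^0$ for all
$i, j \in G$; let $x_G^0$ denote $x_i^0$ for any $i \in G$, and define $x_G^*$ likewise. Therefore,

$$\|z - x^*\|^2 \le \|z - x^0\|^2 = \sum_{G \in \mathcal{G}} \sum_{i \in G} (z_i - x_i^0)^2 = \sum_{G \in \mathcal{G}} \left( \frac{1}{2|G|} \sum_{i,j \in G} (z_i - z_j)^2 + \sum_{i \in G} \left( (z_{\mathcal{G}})_i - x_G^0 \right)^2 \right)$$

$$\le \sum_{G \in \mathcal{G}} \left( \frac{1}{2|G|} \sum_{i,j \in G} (z_i - z_j)^2 + \sum_{i \in G} \left( (z_{\mathcal{G}})_i - x_G^* \right)^2 \right) = \|z - x^*\|^2.$$

Thus, $\|z - x^*\| = \|z - x^0\|$, so by the uniqueness of the projection, we have $x^0 = x^*$.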


B Proof of Proposition 3.1 


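First note that because $\langle w, x^* \rangle = \varepsilon$, Proposition 2.2 implies that

$$\lambda^* = \frac{\sum_{i=1}^n z_i w_i - \varepsilon + \sum_{i=1}^{n-1} v_i^* (w_i - w_{i+1}) + v_n^* w_n}{\sum_{i=1}^n w_i^2} \ge \lambda_1 \tag{B.1}$$

because $v_i^* (w_i - w_{i+1}) \ge 0$ for all $i$ and $v_n^* w_n \ge 0$.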

Part 1: Suppose $r = \infty$. Then $w$ is a constant vector. Thus, the result follows from Proposition 2.3.

Part 2: Suppose that $\lambda_1 > r$. Then $\lambda^* > r$ by Equation (B.1).

Part 3: Suppose that $\infty > r \ge \lambda_1$ and $z_n - \lambda_1 w_n \ge 0$. Then $z - \lambda_1 w \in \mathcal{T}$ and $x^0 = z - \lambda_1 w$ satisfies the conditions of
Proposition 2.2 with $v^* = 0$ and $\lambda^* = \lambda_1$. Thus, $x^* = z - \lambda_1 w$.

Part 4: Suppose that $\infty > r \ge \lambda_1$, $z_n - \lambda_1 w_n < 0$, and $\lambda_0 > r$. Then, $z_n - \lambda^* w_n \le z_n - \lambda_1 w_n < 0$. From $x_n^* = z_n - \lambda^* w_n +
v_n^* - v_{n-1}^* < v_n^*$, and $v_n^* x_n^* = 0$, we have $x_n^* = 0$. Next, because $\mathcal{G}(z) \preceq \mathcal{G}^*$, we have $\{i \mid z_i = z_n\} \subseteq \{i \mid x_i^* = x_n^*\} = \{i \mid x_i^* = 0\}$
and so $\{i \mid z_i > z_n\} \supseteq \{i \mid x_i^* > 0\}$. Let $k_0 = \max\{i \mid z_i > z_n\}$. Therefore, from $\sum_{\{i \mid z_i > z_n\}} x_i^* w_i = \sum_{\{i \mid x_i^* > 0\}} x_i^* w_i = \varepsilon$ and
Proposition 2.2, we have

$$\lambda^* = \frac{\sum_{\{i \mid z_i > z_n\}} z_i w_i - \varepsilon + \sum_{\{i \mid z_i > z_n\}} v_i^* (w_i - w_{i+1}) + v_{k_0}^* w_{k_0 + 1}}{\sum_{\{i \mid z_i > z_n\}} w_i^2} \ge \lambda_0 > r \tag{B.2}$$

where we use the bound $\sum_{\{i \mid z_i > z_n\}} v_i^* (w_i - w_{i+1}) + v_{k_0}^* w_{k_0 + 1} \ge 0$. Notice that $x_n^* = 0$ and the first inequality in Equation (B.2)
holds whether or not $\lambda_0 > r$: we just need $\infty > r \ge \lambda_1$ and $z_n - \lambda_1 w_n < 0$. We will use this fact in Part 6 below.

Part 5: Suppose that $\infty > r \ge \lambda_1$, $z_n - \lambda_1 w_n < 0$, $r \ge \lambda_0$, $n' \le n$ and $z_{n'} = z_n$. Then $\max\{z - \lambda_0 w, 0\} \in \mathcal{T}$. In addition,
we have $\langle w, \max\{z - \lambda_0 w, 0\} \rangle = \varepsilon$ by the choice of $\lambda_0$. We will now define a vector $v \in \mathbb{R}^n_+$ recursively: If $z_i > z_n$, set $v_i = 0$;
otherwise set $v_i = v_{i-1} - (z_i - \lambda_0 w_i)$. We can satisfy the optimality conditions of Proposition 2.2 with $\lambda^* = \lambda_0$ and $v^* = v$.
Thus, $x^* = \max\{z - \lambda_0 w, 0\}$.

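Part 6: Suppose that $\infty > r \ge \lambda_1$, $z_n - \lambda_1 w_n < 0$, $r \ge \lambda_0$, $n' \le n$ and $z_{n'} \ne z_n$. From the proof of Part 4 we have $z_k - \lambda^* w_k \le
z_k - \lambda_0 w_k < 0$ for all $k = n', n' + 1, \ldots, n$ (from $\lambda^* \ge \lambda_0$) and $x_n^* = 0$. Suppose that $x_{n'}^* \ne x_n^* = 0$. Let $n'' = \min\{k \mid x_k^* = 0\}$.
Then $n'' - 1 \ge n' \ge 1$. Thus because $x_{n''-1}^* \ne x_{n''}^* = 0$, we have $v_{n''-1}^* = 0$ and $x_{n''-1}^* = z_{n''-1} - \lambda^* w_{n''-1} - v_{n''-2}^* < 0$
(where we let $v_{n''-2}^* = 0$ if $n'' = 2$). This is a contradiction because $x^* \in \mathcal{T}$. Thus, $x_{n'}^* = x_{n'+1}^* = \cdots = x_n^* = 0$. If $n' = 1$,
then we see that $\mathcal{G}(z) \preceq \mathcal{G}_0 \preceq \mathcal{G}^*$. Furthermore, if $n' > 1$, then we claim that $n' - 1$ and $n'$ are not in the same group in $\mathcal{G}(z)$,
i.e., that $z_{n'-1} \ne z_{n'}$. Indeed, if $z_{n'-1} = z_{n'}$, then $w_{n'-1} = w_{n'}$ and hence, $z_{n'-1} - \lambda_0 w_{n'-1} = z_{n'} - \lambda_0 w_{n'} < 0$, which is a
contradiction.

Thus, this argument has shown that $\mathcal{G}(z) = \{G \in \mathcal{G}(z) \mid \max(G) < n'\} \cup \{G \in \mathcal{G}(z) \mid \min(G) \ge n'\}$ and there exists
$G_2 \in \mathcal{G}^*$ with $\{n', \ldots, n\} \subseteq G_2$. Note that the first of these identities implies that $\mathcal{G}_0 \in \mathcal{P}_n$. Let us now prove the claimed
nestings: $\mathcal{G}(z) \preceq \mathcal{G}_0 = \mathcal{G}(z_{\mathcal{G}_0}) \preceq \mathcal{G}^*$.

1. ($\mathcal{G}(z) \preceq \mathcal{G}_0$): Suppose that $G \in \mathcal{G}(z)$. If $\max(G) < n'$, then $G \in \mathcal{G}_0$. If $\min(G) \ge n'$, then $G \subseteq \{n', \ldots, n\} \in \mathcal{G}_0$. Thus,
$\mathcal{G}(z) \preceq \mathcal{G}_0$.

2. ($\mathcal{G}_0 = \mathcal{G}(z_{\mathcal{G}_0})$): The identity follows because

$$(z_{\mathcal{G}_0})_i = \begin{cases} z_i & \text{if } i < n'; \\ \dfrac{1}{n - n' + 1} \sum_{j = n'}^{n} z_j & \text{if } i \ge n'. \end{cases}$$

3. ($\mathcal{G}_0 \preceq \mathcal{G}^*$): Suppose that $G \in \mathcal{G}_0$. If $\max(G) < n'$, it follows that $G \in \mathcal{G}(z)$ and hence by Part 1 of Proposition 2.4, there is
a $G_2 \in \mathcal{G}^*$ with $G \subseteq G_2$. If $\min(G) \ge n'$, then $G = \{n', \ldots, n\}$ and there exists $G_2 \in \mathcal{G}^*$ with $G \subseteq G_2$. Therefore, $\mathcal{G}_0 \preceq \mathcal{G}^*$.

Finally, note that $|\mathcal{G}(z_{\mathcal{G}_0})| = |\mathcal{G}_0| \le |\mathcal{G}(z)| - 1$ because $z_{n'} \ne z_n$ implies that $\{G \in \mathcal{G}(z) \mid \min(G) \ge n'\} \subseteq \mathcal{G}(z)$ contains at least
two distinct groups that are both contained in $\{n', \ldots, n\} \in \mathcal{G}_0$.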

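Part 7: Suppose that $\infty > r \ge \lambda_1$, $z_n - \lambda_1 w_n < 0$, $r \ge \lambda_0$, and $n' = n + 1$. Then $z_n - \lambda_0 w_n \ge 0$. Thus, $\lambda_1 > \lambda_0$ and

$$\lambda_1 \left( \sum_{\{i \mid z_i > z_n\}} w_i^2 + \sum_{\{i \mid z_i = z_n\}} w_i^2 \right) = \left( \sum_{\{i \mid z_i > z_n\}} z_i w_i + \sum_{\{i \mid z_i = z_n\}} z_i w_i \right) - \varepsilon$$

$$< \left( \sum_{\{i \mid z_i > z_n\}} z_i w_i - \varepsilon \right) + \sum_{\{i \mid z_i = z_n\}} \lambda_1 w_i^2 = \lambda_0 \sum_{\{i \mid z_i > z_n\}} w_i^2 + \lambda_1 \sum_{\{i \mid z_i = z_n\}} w_i^2$$

where the strict inequality follows from $z_n < \lambda_1 w_n$. Thus, $\lambda_1 < \lambda_0$, which is a contradiction.

The final conclusions of the proposition are simple consequences of Lemma 2.2, Proposition 2.4, and the 6 alternatives.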


C An Example 

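In this section, we project the point $z_0 = (3, 2, 1, -1, 2)$ onto the OWL ball of radius $\varepsilon = 1$ with weights $w_0 = (5, 4, 3, 1, 1)$.

- Preprocessing.
  - Set $s := \operatorname{sign}(z_0) = (1, 1, 1, -1, 1)^T$;
  - Set $z := Q(|z_0|)|z_0| = (3, 2, 2, 1, 1)^T$;
  - Set $\mathcal{G}(z) \leftarrow \{\{1\}, \{2, 3\}, \{4, 5\}\}$;
  - Set $w := (w_0)_{\mathcal{G}(z)} = (5, 7/2, 7/2, 1, 1)^T$;

- Iteration 1.
  - Set $r \leftarrow \min\left\{ \dfrac{3 - 2}{5 - 7/2}, \infty, \dfrac{2 - 1}{7/2 - 1}, \infty \right\} = \dfrac{2}{5}$;
  - Set $\lambda_0 \leftarrow \dfrac{28}{49.5}$ and $\lambda_1 \leftarrow \dfrac{30}{51.5}$ (here $\langle z, w \rangle = 31$, so $\langle z, w \rangle - \varepsilon = 30$);
  - Set $n' \leftarrow 6$;
  - Set $\mathcal{G}_0(z) \leftarrow \mathcal{G}(z)$;
  - Test 2c passed: $\lambda_1 = 30/51.5 > 2/5 = r$;
    - Set $\mathcal{G}(z - \lambda_1 w) \leftarrow \{\{1\}, \{2, 3, 4, 5\}\}$;
    - Set $z \leftarrow z_{\mathcal{G}(z - \lambda_1 w)} = (3, 3/2, 3/2, 3/2, 3/2)^T$;
    - Set $w \leftarrow w_{\mathcal{G}(z - \lambda_1 w)} = (5, 9/4, 9/4, 9/4, 9/4)^T$;

- Iteration 2.
  - Set $r \leftarrow \min\left\{ \dfrac{3 - 3/2}{5 - 9/4}, \infty, \infty, \infty \right\} = \dfrac{12}{22}$;
  - Set $\lambda_0 \leftarrow \dfrac{14}{25}$ and $\lambda_1 \leftarrow \dfrac{27.5}{45.25}$;
  - Set $n' \leftarrow 6$;
  - Set $\mathcal{G}_0(z) \leftarrow \mathcal{G}(z) = \{\{1\}, \{2, 3, 4, 5\}\}$;
  - Test 2c passed: $\lambda_1 = 27.5/45.25 > 12/22 = r$;
    - Set $\mathcal{G}(z - \lambda_1 w) \leftarrow \{\{1, 2, 3, 4, 5\}\}$;
    - Set $z \leftarrow z_{\mathcal{G}(z - \lambda_1 w)} = (9/5, 9/5, 9/5, 9/5, 9/5)^T$;
    - Set $w \leftarrow w_{\mathcal{G}(z - \lambda_1 w)} = (14/5, 14/5, 14/5, 14/5, 14/5)^T$;

- Iteration 3.
  - Set $r = \infty$;
  - Test 2b passed: (We use Proposition 2.3 to finish.)
    - Set $\lambda = 121/70$;
    - Set $x^* = \max\{z - \lambda, 0\} = (1/14, 1/14, 1/14, 1/14, 1/14)^T$;

- Undo preprocessing.
  - Set $x_0^* = s \odot Q(|z_0|)^T x^* = (1/14, 1/14, 1/14, -1/14, 1/14)^T$;

- Terminate.
  - We have $P_{B(w_0, \varepsilon)}(z_0) = x_0^*$.

Notice that $x_0^*$ satisfies $\Omega_{w_0}(x_0^*) = 1$ because $\sum_{i=1}^5 (w_0)_i = 14$.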
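The outcome of this example can be sanity-checked numerically. The Python sketch below confirms with exact arithmetic that the reported point lies on the boundary of the OWL ball, and then draws random points inside the ball to check that none of them is closer to $z_0$. The helper names `owl_norm` and `dist2`, the fixed seed, and the sample count are choices made for this illustration only; a random search of this kind is evidence, not a proof of optimality.

```python
import random
from fractions import Fraction


def owl_norm(x, w):
    """Evaluate the OWL norm (2.1) by sorting the magnitudes of x."""
    mags = sorted((abs(v) for v in x), reverse=True)
    return sum(wi * m for wi, m in zip(w, mags))


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


w0 = [5, 4, 3, 1, 1]
z0 = [3, 2, 1, -1, 2]
x0 = [Fraction(s, 14) for s in (1, 1, 1, -1, 1)]   # reported projection

print(owl_norm(x0, w0))   # exact value of the OWL norm of the reported point

rng = random.Random(0)
best = float(dist2(z0, x0))
closer = 0
for _ in range(2000):
    y = [rng.uniform(-1, 1) for _ in z0]
    scale = rng.random() / owl_norm(y, w0)   # rescale so that the OWL norm is at most 1
    y = [scale * v for v in y]
    if dist2(z0, y) < best - 1e-12:
        closer += 1
print(closer)             # number of sampled feasible points closer to z0
```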





